Data files:
- aact_studies.tsv
- aact_drugs.tsv
- aact_descriptions.tsv
- aact_drugs_leadmine.tsv
- aact_drugs_smi_pubchem_cid.tsv
- aact_drugs_smi_pubchem_cid2inchi.tsv
- aact_drugs_inchi2chembl.tsv
- aact_drugs_chembl_activity_pchembl.tsv
- aact_drugs_chembl_target_component.tsv
- pharos_targets.tsv
- aact_descriptions_tagger_matches.tsv
- diseases_entities.tsv

nct_id is the study ID.
## [1] "Mon Apr 8 15:44:27 2019"
```r
library(readr)
library(data.table)
library(stringr)
library(plotly, quietly = TRUE)
```
Read the file of all studies in AACT.
## [1] "Total studies: 300214 ; unique NCT_IDs: 300214"
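A minimal base-R sketch of this read-and-count step, assuming the studies export is tab-separated with a header row that includes nct_id (the real column layout of aact_studies.tsv is not shown in this report). A small temporary file with toy rows stands in for the real export.

```r
# Toy stand-in for aact_studies.tsv (hypothetical rows, not real AACT data)
tsv <- tempfile(fileext = ".tsv")
writeLines(c("nct_id\tstudy_type\tphase",
             "NCT01\tInterventional\tPhase 1",
             "NCT02\tObservational\tNA",
             "NCT03\tInterventional\tPhase 3"), tsv)

# Read every column as character; na.strings = "" keeps the literal "NA" label
studies <- read.delim(tsv, colClasses = "character", na.strings = "")

# Equal counts mean the file has exactly one row per study
print(sprintf("Total studies: %d ; unique NCT_IDs: %d",
              nrow(studies), length(unique(studies$nct_id))))
```

When every row is one study, the two counts agree, as they do above (300214 each).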
References of type results_reference may offer greater evidence and confidence.
## [1] "references: 388031; NCT_IDs: 61208; PMIDs: 287758; results_references: 64880"
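References can be counted the same way. This sketch assumes columns named nct_id, pmid and reference_type, with results_reference as one of the reference_type values; the rows are toy data, not AACT records.

```r
# Toy study-reference table (hypothetical rows)
refs <- data.frame(
  nct_id         = c("NCT01", "NCT01", "NCT02", "NCT03"),
  pmid           = c("111", "222", "111", "333"),
  reference_type = c("reference", "results_reference",
                     "reference", "results_reference"),
  stringsAsFactors = FALSE)

# Distinct studies and PMIDs, plus the subset typed as results references
print(sprintf("references: %d; NCT_IDs: %d; PMIDs: %d; results_references: %d",
              nrow(refs),
              length(unique(refs$nct_id)),
              length(unique(refs$pmid)),
              sum(refs$reference_type == "results_reference")))
```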
Read the file of all drugs in AACT.
id is the AACT intervention ID.
## [1] "Unique drug names: 91347 ; unique intervention IDs: 255077"
Select only Interventional studies (study_type) associated with drugs (via nct_id).
## [1] "Interventional studies: 237892 (79.2%)"
## [1] "Interventional drug studies: 124421 ; unique NCT_IDs: 124421"
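The two-step selection can be sketched as a filter on study_type followed by a membership test on nct_id. The column names (study_type, nct_id, name) and the toy rows are assumptions for illustration.

```r
# Toy studies and drug interventions (hypothetical rows)
studies <- data.frame(
  nct_id     = c("NCT01", "NCT02", "NCT03", "NCT04"),
  study_type = c("Interventional", "Observational",
                 "Interventional", "Interventional"),
  stringsAsFactors = FALSE)
drugs <- data.frame(
  id     = c(1, 2, 3),
  nct_id = c("NCT01", "NCT02", "NCT01"),
  name   = c("cisplatin", "metformin", "paclitaxel"),
  stringsAsFactors = FALSE)

# Step 1: keep interventional studies only
interv <- studies[studies$study_type == "Interventional", ]
print(sprintf("Interventional studies: %d (%.1f%%)",
              nrow(interv), 100 * nrow(interv) / nrow(studies)))

# Step 2: keep those with at least one drug intervention, linked via nct_id
interv_drug <- interv[interv$nct_id %in% drugs$nct_id, ]
print(sprintf("Interventional drug studies: %d ; unique NCT_IDs: %d",
              nrow(interv_drug), length(unique(interv_drug$nct_id))))
```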
| phase | N_studies | N_drugs |
|---|---|---|
| Early Phase 1 | 1574 | 2615 |
| Phase 1 | 23603 | 48593 |
| Phase 1/Phase 2 | 6663 | 13288 |
| Phase 2 | 33910 | 68850 |
| Phase 2/Phase 3 | 3305 | 6503 |
| Phase 3 | 22988 | 49507 |
| Phase 4 | 19593 | 36331 |
| NA | 12785 | 29390 |
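Per-phase counts like those in the table can be computed with tapply. This sketch assumes a joined table with one row per study and drug intervention, and reads N_drugs as the number of such rows per phase; that reading is an assumption. Missing phases are kept as an explicit "NA" group, matching the NA row above.

```r
# Toy joined table: one row per (study, drug intervention) pair (hypothetical)
sd <- data.frame(
  nct_id  = c("NCT01", "NCT01", "NCT02", "NCT03", "NCT03", "NCT03"),
  phase   = c("Phase 1", "Phase 1", "Phase 2", NA, NA, NA),
  drug_id = 1:6,
  stringsAsFactors = FALSE)

# Turn missing phase into an explicit group so tapply does not drop it
sd$phase[is.na(sd$phase)] <- "NA"

# Distinct studies and intervention rows per phase
n_studies <- tapply(sd$nct_id, sd$phase, function(x) length(unique(x)))
n_drugs   <- tapply(sd$drug_id, sd$phase, length)
phase_counts <- data.frame(phase     = names(n_studies),
                           N_studies = as.vector(n_studies),
                           N_drugs   = as.vector(n_drugs[names(n_studies)]),
                           stringsAsFactors = FALSE)
print(phase_counts)
```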
| overall_status | N |
|---|---|
| Completed | 145006 |
| Recruiting | 33973 |
| Terminated | 19618 |
| Unknown status | 18463 |
| Active, not recruiting | 13962 |
| Not yet recruiting | 8001 |
| NA | 7080 |
| Withdrawn | 6969 |
| Enrolling by invitation | 1060 |
| Suspended | 945 |
(To do: stack with study start_year.)
AACT drug names were resolved to standard names and chemical structures (SMILES). Note that one name may include multiple chemicals. With structures resolved, drugs can be counted in a cheminformatically rigorous way, as active pharmaceutical ingredients (APIs).
## [1] "Drug unique SMILES resolved by LeadMine: 4699 ; unique intervention IDs: 171741"
| N_mentions | names |
|---|---|
| 2637 | Abraxane; PACLITAXEL; Paclitaxel; Taxol; abraxane; paclitaxel; taxol |
| 2545 | CYCLOPHOSPHAMIDE; Ciclophosphamide; Cyclophosphamid; Cyclophosphamide; ciclophosphamide; cyclophosphamide |
| 2461 | CISPLATIN; Cis-platinum; Cisplatin; Cisplatine; Cisplatinum; cis Platinum; cis-platinum; cisplatin; cisplatine; cisplatinum |
| 2070 | DEXAMETHASONE; Dexamethason; Dexamethasone; Dexamethosone; Maxitrol; OZURDEX; Oradexon; Ozurdex; dexamethason; dexamethasone; dexamethosone |
| 2054 | CARBOPLATIN; Carboplatin; Carboplatine; Paraplatin; carboplatin; carboplatine |
| 1779 | DOCETAXEL; Docetaxel; docetaxel |
| 1625 | METFORMIN; MetFORMIN; Metformin; Metformine; metformin; metformine |
| 1540 | GEMCITABINE; Gemcitabine; gemcitabine |
| 1342 | CAPECITABINE; Capecitabin; Capecitabine; XELODA; Xeloda; capecitabine; xeloda |
| 1178 | Cortancyl; Lodotra; Meticorten; Prednison; Prednisone; RAYOS; prednison; prednisone |
| 1157 | 0xaliplatin; Eloxatin; OXALIPLATIN; OXAliplatin; Oxaliplatin; Oxaliplatine; eloxatin; oxaliplatin; oxaliplatine |
| 1157 | METHOTREXATE; Methotrexate; Metoject; methotrexate |
| 1086 | BUPIVACAINE; Bupivacain; Bupivacaine; EXPAREL; Exparel; SKY0402; bupivacain; bupivacaine |
| 1044 | ETOPOSIDE; Etoposid; Etoposide; etoposide |
| 1027 | ADOPORT; ADVAGRAF; Adoport; Advagraf; ENVARSUS; Envarsus; FK-506; FK506; PROGRAF; Prograf; Protopic; TACROLIMUS; Tacrolimus; tacrolimus |
| 978 | NORMAL SALINE; Normal Saline; Normal saline; normal salin; normal saline |
| 977 | LIDOCAINE; LMX 4; LMX4; Lidocain; Lidocaine; Lidoderm; Lignocain; Lignocaine; Oraqix; lidocain; lidocaine; lignocaine |
| 908 | CYTARABINE; Cytarabine; Cytosar; DepoCyt; DepoCyte; Depocyt; Depocyte; cytarabine; cytosar |
| 903 | COPEGUS; Copegus; REBETOL; RIBAVIRIN; Rebetol; Ribasphere; Ribavarin; Ribavirin; Ribavirine; Virazole; rebetol; ribavarin; ribavirin |
| 846 | Diprivan; PROPOFOL; Propofol; propofol |
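The name lists in the table can be produced by grouping resolved matches by SMILES. A sketch, assuming hypothetical columns id (intervention ID), name and smiles in the LeadMine output; SMI_A and SMI_B are placeholders, not real SMILES. Sorting with method = "radix" collates in the C locale (digits, then uppercase, then lowercase), which matches the ordering of names in the table.

```r
# Toy LeadMine-style matches (hypothetical; SMI_* are placeholder structures)
lm <- data.frame(
  id     = c(101, 102, 103, 104, 105),
  name   = c("Taxol", "paclitaxel", "Paclitaxel", "metformin", "paclitaxel"),
  smiles = c("SMI_A", "SMI_A", "SMI_A", "SMI_B", "SMI_A"),
  stringsAsFactors = FALSE)

# One row per structure: distinct intervention IDs and distinct names
by_smi <- split(lm, lm$smiles)
smi_summary <- data.frame(
  smiles     = names(by_smi),
  N_mentions = vapply(by_smi, function(d) length(unique(d$id)), integer(1)),
  names      = vapply(by_smi, function(d)
                 paste(sort(unique(d$name), method = "radix"), collapse = "; "),
                 character(1)),
  stringsAsFactors = FALSE, row.names = NULL)

# Most-mentioned structures first
smi_summary <- smi_summary[order(-smi_summary$N_mentions), ]
print(smi_summary)
```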
## [1] "Drugs (drug names) with resolved structure: 180555 / 197300 (91.5%)"
## [1] "Mentions by intervention ID: 157862 / 171741 (91.9%)"
## [1] "Mentions by study: 92966 / 99647 (93.3%)"
## [1] "Mentions by drug name: 11108 / 58297 (19.1%)"
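Each coverage line is a ratio of hits to total. A hypothetical helper, coverage_str, reproduces the formatting used in these report lines:

```r
# Hypothetical helper: "label: hits / total (pct%)", percentage to one decimal
coverage_str <- function(label, hits, total) {
  sprintf("%s: %d / %d (%.1f%%)", label, hits, total, 100 * hits / total)
}

print(coverage_str("Drugs (drug names) with resolved structure", 180555L, 197300L))
```

For example, 180555 / 197300 rounds to 91.5%, as reported above.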
## [1] "PubChem SMILES2CID hits: 3960 / 4698 (84.3%)"
## [1] "Intervention IDs mapped to PubChem CIDs (via SMILES): 153876"
## [1] "PubChem CIDs with InChIKeys: 3801"
## [1] "ChEMBL compounds mapped via InChIKeys: 3332"
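The SMILES to PubChem CID to InChIKey to ChEMBL chain can be sketched as successive inner joins, where each join drops compounds that fail to map. The three lookup tables, their column names and the placeholder identifiers are hypothetical.

```r
# Hypothetical lookup tables with placeholder identifiers
smi2cid <- data.frame(smiles = c("SMI_A", "SMI_B", "SMI_C"),
                      cid    = c(1L, 2L, NA),
                      stringsAsFactors = FALSE)
cid2ikey <- data.frame(cid = c(1L, 2L), inchikey = c("IKEY_A", "IKEY_B"),
                       stringsAsFactors = FALSE)
ikey2chembl <- data.frame(inchikey = "IKEY_A", chembl_id = "CHEMBL_X",
                          stringsAsFactors = FALSE)

# SMILES that PubChem resolved to a CID
hits <- smi2cid[!is.na(smi2cid$cid), ]
print(sprintf("PubChem SMILES2CID hits: %d / %d", nrow(hits), nrow(smi2cid)))

# Inner joins: unmapped rows are dropped at each step
m <- merge(hits, cid2ikey, by = "cid")
m <- merge(m, ikey2chembl, by = "inchikey")
print(sprintf("ChEMBL compounds mapped via InChIKeys: %d",
              length(unique(m$chembl_id))))
```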
Select only activities with pChEMBL values, for confidence.
## [1] "ChEMBL activities: 124438"
## [1] "ChEMBL activities molecules: 2287 ; targets: 3832 ; documents: 16198"
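Filtering on pChEMBL can be sketched as dropping activities with a missing value. The column names below follow the ChEMBL web-service activity fields but are assumptions about this pipeline's files; the rows are toy data. pChEMBL is the negative log10 of a molar potency or affinity value (for example IC50 or Ki), and ChEMBL assigns it only to a standardized subset of activity records.

```r
# Toy activity records (hypothetical IDs and values)
act <- data.frame(
  molecule_chembl_id = c("M1", "M1", "M2", "M3"),
  target_chembl_id   = c("T1", "T2", "T1", "T3"),
  document_chembl_id = c("D1", "D1", "D2", "D3"),
  pchembl_value      = c(7.2, NA, 5.1, NA),
  stringsAsFactors = FALSE)

# Keep only activities that carry a pChEMBL value
act <- act[!is.na(act$pchembl_value), ]
print(sprintf("ChEMBL activities: %d", nrow(act)))
print(sprintf("ChEMBL activities molecules: %d ; targets: %d ; documents: %d",
              length(unique(act$molecule_chembl_id)),
              length(unique(act$target_chembl_id)),
              length(unique(act$document_chembl_id))))
```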
## [1] "ChEMBL target proteins: 3157"
## [1] "ChEMBL target proteins mapped to TCRD (human): 1806"
## [1] "Organisms: 187"
| organism | N_targets |
|---|---|
| Homo sapiens | 1806 |
| Rattus norvegicus | 529 |
| Mus musculus | 238 |
| Bos taurus | 98 |
| Sus scrofa | 36 |
| Cavia porcellus | 26 |
| Escherichia coli K-12 | 19 |
| Oryctolagus cuniculus | 18 |
| Escherichia coli | 17 |
| Mycobacterium tuberculosis | 17 |
## [1] "Human targets: 1806"
| target_type | N |
|---|---|
| SINGLE PROTEIN | 1216 |
| PROTEIN COMPLEX | 247 |
| PROTEIN FAMILY | 210 |
| PROTEIN COMPLEX GROUP | 91 |
| PROTEIN-PROTEIN INTERACTION | 16 |
| SELECTIVITY GROUP | 14 |
| CHIMERIC PROTEIN | 12 |
## [1] "Human single-protein targets: 1216 ; unique UniProts: 1216"
## [1] " Tchem: 733" " Tclin: 341" " Tbio: 140"
## [4] " Tdark: 2"
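The human single-protein selection and the TDL (Target Development Level: Tclin, Tchem, Tbio, Tdark) tally can be sketched as follows. The target table, its columns (accession, organism, target_type, tdl) and the rows are hypothetical stand-ins for the merged ChEMBL/TCRD data.

```r
# Toy merged target table (hypothetical rows)
tgt <- data.frame(
  accession   = c("P1", "P2", "P3", "P4", "P5"),
  organism    = c("Homo sapiens", "Homo sapiens", "Rattus norvegicus",
                  "Homo sapiens", "Homo sapiens"),
  target_type = c("SINGLE PROTEIN", "SINGLE PROTEIN", "SINGLE PROTEIN",
                  "PROTEIN COMPLEX", "SINGLE PROTEIN"),
  tdl         = c("Tclin", "Tchem", "Tchem", "Tbio", "Tchem"),
  stringsAsFactors = FALSE)

# Human single-protein targets only
human_sp <- tgt[tgt$organism == "Homo sapiens" &
                tgt$target_type == "SINGLE PROTEIN", ]
print(sprintf("Human single-protein targets: %d ; unique UniProts: %d",
              nrow(human_sp), length(unique(human_sp$accession))))

# TDL counts, most frequent first
print(sort(table(human_sp$tdl), decreasing = TRUE))
```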
Disease mentions were tagged with the JensenLab tagger, using the JensenLab DOID entities dictionary, on study descriptions from the detailed_descriptions table.
serialno corresponds to the DOID. id is the AACT primary key.
Likely false positives, manually removed:
| doid | N_mentions | terms |
|---|---|---|
| DOID:4 | 76402 | DISEASE; Disease; dis- ease; dis-ease; disease |
| DOID:162 | 28596 | CANCER; CANcer; Cancer; Malignant Tumor; Malignant neoplasm; Malignant tumor; Primary Cancer; Primary cancer; cancer; malignant Tumor; malignant neoplasm; malignant tumor; primary cancer |
| DOID:9351 | 17274 | DIABETES; DIABETES MELLITUS; DIAbetes; DIabetes; Diabetes; Diabetes Mellitus; Diabetes mellitus; diabetes; diabetes Mellitus; diabetes mellitus; diabetes-mellitus |
| DOID:6713 | 16632 | CVA; Cerebrovascular Accident; Cerebrovascular Disease; Cerebrovascular accident; Cerebrovascular disease; STROKE; STRokE; Stroke; cerebro- vascular disease; cerebro-vascular disease; cerebrovascul… |
| DOID:2030 | 12084 | ANXIETY; Anxiety; Anxiety Disorder; Anxiety state; anxiety; anxiety disorder; anxiety state; anxiety syndrome; anxiety-state |
| DOID:1612 | 10583 | BREAST CANCER; BReast CAncer; BReast Cancer; Breast Cancer; Breast cancer; Breast tumor; Breast-cancer; Primary breast cancer; breast Cancer; breast caNcEr; breast cancer; breast tumor; breast-canc… |
| DOID:2841 | 10021 | ASTHMA; Asthma; BHR; Bronchial hyper-reactivity; Bronchial hyperreactivity; EIA; Exercise-induced asthma; asthma; bronchial hyper reactivity; bronchial hyper-reactivity; bronchial hyperreactivity; … |
| DOID:3083 | 9782 | CHRONIC OBSTRUCTIVE PULMONARY DISEASE; COLD; COPD; COPd; Chronic Obstructive Lung Disease; Chronic Obstructive Lung disease; Chronic Obstructive Pulmonary Disease; Chronic Obstructive Pulmonary dis… |
| DOID:9970 | 9303 | OBESITY; OBesity; Obesity; obEsity; obe-sity; obesity |
| DOID:10763 | 9144 | HBP; HTN; HYPERTENSION; High Blood Pressure; High blood pressure; High-blood pressure; Hypertension; Hypertensive disease; high blood Pressure; high blood pressure; high blood-pressure; htn; hyper-… |
| DOID:3393 | 6816 | C-HD; CAD; CHD; CORONARY ARTERY DISEASE; CORONARY SYNDROME; CORONARY syndrome; ChD; Coronary ARtery DIsease; Coronary Artery Disease; Coronary Disease; Coronary Heart Disease; Coronary Heart diseas… |
| DOID:0060145 | 6115 | ANALGESIA; Analgesia; analgeSia; analgesia |
| DOID:9352 | 5848 | Diabetes Mellitus Type 2; Diabetes Mellitus Type II; Diabetes Mellitus type 2; Diabetes Mellitus, Type II; Diabetes mellitus Type 2; Diabetes mellitus non-insulin-dependent; Diabetes mellitus type … |
| DOID:10283 | 5056 | Familial Prostate Cancer; HPC; PRostate Cancer; Prostate CAncer; Prostate Cancer; Prostate cancer; Prostatic cancer; hereditary prostate cancer; prostate Cancer; prostate cancer; prostate-cancer; p… |
| DOID:8469 | 4985 | FLU; Flu; Influenza; flu; influenza |
| DOID:225 | 4962 | SYNDROME; Syndrome; syn drome; syndrome |
| DOID:3908 | 4959 | NSCLC; Non Small Cell Lung Cancer; Non Small Cell Lung Carcinoma; Non Small Cell Lung cancer; Non small cell lung cancer; Non small-cell lung cancer; Non- small cell lung cancer; Non-Small Cell Lun… |
| DOID:784 | 4841 | CKD; CKF; CRD; CRF; Chronic Kidney Disease; Chronic Kidney disease; Chronic Kidney failure; Chronic Renal Disease; Chronic kidney disease; Chronic kidney failure; Chronic renal disease; chronic Kid… |
| DOID:5419 | 4689 | SCHIZOPHRENIA; Schizophrenia; schizophrenia |
| DOID:684 | 3836 | HCC; HEPATOCELLULAR CARCINOMA; Hepatocellular Carcinoma; Hepatocellular carcinoma; Hepatoma; hcc; hepato-cellular carcinoma; hepatocellular Carcinoma; hepatocellular carcinoma; hepatoma |
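Removing manually flagged false positives and counting mentions per DOID could look like this. The exclusion list here (DOID:4 "disease", DOID:225 "syndrome") is a hypothetical example of overly generic terms, not the list actually used; the match rows are toy data.

```r
# Toy tagger matches: study ID and matched DOID (hypothetical rows)
matches <- data.frame(
  nct_id = c("NCT01", "NCT01", "NCT02", "NCT02", "NCT03"),
  doid   = c("DOID:4", "DOID:1612", "DOID:1612", "DOID:225", "DOID:9351"),
  stringsAsFactors = FALSE)

# Hypothetical exclusion list of overly generic terms
false_pos <- c("DOID:4", "DOID:225")
matches <- matches[!(matches$doid %in% false_pos), ]

# Mentions per DOID, most frequent first
doid_counts <- sort(table(matches$doid), decreasing = TRUE)
print(doid_counts)
```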
Sort synonym terms by frequency.
| nct_id | doid | N_mentions | disease_terms |
|---|---|---|---|
| NCT00000102 | DOID:0050811 | 1 | congenital adrenal hyperplasia |
| NCT00000105 | DOID:11338 | 6 | tetanus;Tetanus |
| NCT00000113 | DOID:11830 | 14 | myopia;Myopia;nearsightedness |
| NCT00000113 | DOID:9835 | 1 | refractive error |
| NCT00000113 | DOID:1432 | 1 | blindness |
| NCT00000114 | DOID:10584 | 1 | Retinitis pigmentosa |
| NCT00000114 | DOID:8499 | 1 | night blindness |
| NCT00000114 | DOID:8466 | 1 | retinal degeneration |
| NCT00000114 | DOID:4 | 1 | disease |
| NCT00000115 | DOID:4447 | 5 | cystoid macular edema |
| NCT00000115 | DOID:13141 | 5 | uveitis;Uveitis |
| NCT00000115 | DOID:1686 | 2 | glaucoma |
| NCT00000115 | DOID:4 | 2 | disease |
| NCT00000115 | DOID:8947 | 1 | Diabetic Retinopathy |
| NCT00000115 | DOID:1432 | 1 | visual impairment |
| NCT00000115 | DOID:83 | 1 | cataract |
| NCT00000116 | DOID:4 | 2 | disease |
| NCT00000116 | DOID:10584 | 1 | Retinitis pigmentosa |
| NCT00000116 | DOID:8499 | 1 | night blindness |
| NCT00000117 | DOID:8947 | 1 | Diabetic Retinopathy |
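Building the per-study disease rows (mention count plus distinct terms, most frequent first, joined with ";") can be sketched with table() inside a loop over study-disease pairs. Column names and rows are hypothetical.

```r
# Toy term-level matches for one study (hypothetical rows)
tm <- data.frame(
  nct_id = "NCT01",
  doid   = c(rep("DOID:11830", 6), "DOID:9835"),
  term   = c("myopia", "Myopia", "myopia", "nearsightedness",
             "Myopia", "myopia", "refractive error"),
  stringsAsFactors = FALSE)

# One row per (study, disease) pair
keys <- unique(tm[, c("nct_id", "doid")])
keys$N_mentions    <- NA_integer_
keys$disease_terms <- NA_character_
for (i in seq_len(nrow(keys))) {
  hit  <- tm$nct_id == keys$nct_id[i] & tm$doid == keys$doid[i]
  freq <- sort(table(tm$term[hit]), decreasing = TRUE)  # terms by frequency
  keys$N_mentions[i]    <- sum(hit)
  keys$disease_terms[i] <- paste(names(freq), collapse = ";")
}

# Within each study, most-mentioned diseases first
keys <- keys[order(keys$nct_id, -keys$N_mentions), ]
print(keys)
```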
Also include references.
Keep only studies including both disease and drug mentions.
## [1] "studies linked to 1+ drugs AND 1+ diseases: 39070"
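The final selection is an intersection of study IDs from the drug and disease links. This sketch assumes two hypothetical link tables keyed by nct_id, with toy rows.

```r
# Hypothetical study-drug and study-disease links (toy rows)
drug_links <- data.frame(nct_id = c("NCT01", "NCT02", "NCT03"),
                         smiles = c("SMI_A", "SMI_B", "SMI_A"),
                         stringsAsFactors = FALSE)
disease_links <- data.frame(nct_id = c("NCT02", "NCT03", "NCT04"),
                            doid   = c("DOID:1612", "DOID:9351", "DOID:162"),
                            stringsAsFactors = FALSE)

# Studies with at least one drug AND at least one disease
both <- intersect(unique(drug_links$nct_id), unique(disease_links$nct_id))
print(sprintf("studies linked to 1+ drugs AND 1+ diseases: %d", length(both)))

# Study-drug-disease rows restricted to those studies
links <- merge(drug_links[drug_links$nct_id %in% both, ],
               disease_links[disease_links$nct_id %in% both, ],
               by = "nct_id")
```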